An OLAP Requirements Example: CompSales International (part 8) - Aggregating Data Within the Cube

12/14/2010 3:20:55 PM

Populating the Cube with Data

Now you can process actual data into your cube from the data source view. To do so, you right-click the Comp Sales cube entry in the Solution Explorer and choose the Process item, or you click the Process icon for the cube in the cube designer (second icon from the left). A Process Cube dialog appears, with the object list of available cubes to process. You select the Comp Sales cube (by highlighting it) and then click the Run button to start processing the data (see Figure 33). You can also see in Figure 33 that the Process Option defaults to Process Full. Other options here vary depending on what part of the cube needs to be reprocessed (for example, after structure changes, data refreshes, incremental data changes, and so on).

Figure 33. Process Cube dialog for Comp Sales.
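The same Process Full operation can also be expressed as an XMLA Process command rather than through the dialog. The following is a minimal sketch in Python that only builds the command text. The database ID "CompSales" and the cube ID "Comp Sales" are assumptions made for illustration (the real object IDs in your project may differ), and the helper name build_process_command is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by Analysis Services XMLA commands.
XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def build_process_command(database_id, cube_id, process_type="ProcessFull"):
    """Return an XMLA Process command (as text) for one cube.

    database_id and cube_id are object IDs; the values used below
    are assumed for illustration only.
    """
    ET.register_namespace("", XMLA_NS)  # emit a default xmlns, no prefix
    process = ET.Element(f"{{{XMLA_NS}}}Process")
    obj = ET.SubElement(process, f"{{{XMLA_NS}}}Object")
    ET.SubElement(obj, f"{{{XMLA_NS}}}DatabaseID").text = database_id
    ET.SubElement(obj, f"{{{XMLA_NS}}}CubeID").text = cube_id
    ET.SubElement(process, f"{{{XMLA_NS}}}Type").text = process_type
    return ET.tostring(process, encoding="unicode")


# Assumed IDs; check the ID property of your own database and cube.
command = build_process_command("CompSales", "Comp Sales")
print(command)
```

The resulting text could then be pasted into an XMLA query window in SQL Server Management Studio; the sketch deliberately does not connect to a server.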

A Process Progress dialog appears as the processing begins. Remember that the data loaded at this point consists of the dimension member values and the measure data values; it has not yet been aggregated up through a complete cube representation (at all levels in the hierarchies). That will be done shortly, via the Aggregation Design Wizard. You can actually use your cube right now, but browsing would be challenging from a performance point of view.

Aggregating Data Within the Cube

The last step of creating your OLAP cube is running through the Aggregation Design Wizard and determining how best to represent and aggregate the data for your users. This is the point at which you must determine the aggregation levels and the storage method for these aggregations (MOLAP, HOLAP, or ROLAP) that give the best performance for queries against the cube.

You double-click the cube entry in the Solution Explorer (Comp Sales.cube) to bring up the cube designer for your newly created cube. Then you click the Partitions tab to see the current partition for Comp Sales. Figure 34 shows the default storage mode is MOLAP and that there is no Aggregation Design for this cube yet. Just to the lower right of this tab is the Storage Settings option, which shows the different storage options possible for your partition, as shown in Figure 35.

Figure 34. The Partitions for the Comp Sales cube.

Figure 35. Specifying MOLAP storage mode for your cube in the Storage Settings dialog.

You need to indicate what type of storage mode and caching options you want for the partition that will contain your aggregations. You want to optimize performance and don’t need real-time refreshes of the data. For these reasons, you specify the MOLAP (native SSAS storage) mode. Figure 35 shows this MOLAP specification in the Storage Settings dialog. This dialog works as a sliding scale. You just need to make sure the slider is positioned at the MOLAP storage option.

You also want to take advantage of the proactive caching capabilities that come with SSAS. You can activate this feature by clicking the Options button of this dialog and then checking the Enable Proactive Caching check box at the top of the Storage Options dialog that appears (see Figure 36). In addition, you use the option Update the Cache When Data Changes, as indicated in Figure 36, and you specify the interval times for these refreshes.

Figure 36. Enabling proactive caching for the cube.

A good rule of thumb is to set the cache refresh interval based on response requirements, the volatility of the data behind the data source views, and whether the changes will have a dramatic effect on the BI query results.

Now you can run through the Aggregation Design Wizard to see whether you can optimize your partition for querying. You simply go to the Aggregation Design tab for this cube (from the cube designer) and choose the Design Aggregations option (click the first icon in the Aggregation Design tab or right-click within the tab and choose Design Aggregations). This launches the Aggregation Design Wizard.

First up is the dialog that allows you to specify object counts: the total population of facts and the number of values at each hierarchical level within each dimension. If you know what the full extent of counts will be for your cube, you can manually supply these count values in the Estimated Count column (see Figure 37). You typically do this when you have been able to load only part of the data or when the data will grow quite rapidly over time. If you are building a statically sized cube and have populated the data already, you just click the Count button to tell the wizard to use the actual data as the basis of the aggregation.

Figure 37. Specifying cube object counts for aggregation in the Aggregation Design Wizard.

The next dialog optimizes the storage, based on the level of aggregation. You have four choices: specify a maximum storage approach (you create optimized storage based on the amount of disk space you can allocate to the cube), tell the wizard to optimize until it achieves a certain percentage of performance gain (for example, 50% or 80%), start the aggregation design process dynamically and stop it when you feel the cube is optimized enough, or do no aggregation design at all. You really want to see the design aggregation process happen. Remember that the higher the performance you want, the more storage it will require (and the longer it will take to reprocess the aggregations). As you can see in Figure 38, you should select the I Click Stop option and stop the design aggregation when the optimization level starts to level off (somewhere between 75% and 88%). Any further optimization would really just waste storage space.

Figure 38. Setting the optimal storage and query performance level in the Aggregation Design Wizard.
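The idea of stopping when the optimization curve levels off can be illustrated with a small Python sketch. This is not the algorithm the wizard uses; it is only a way to reason about diminishing returns. The sample points and the 0.05 threshold (percentage points of gain per extra megabyte) are made-up values, and the function name stop_point is hypothetical.

```python
def stop_point(curve, min_gain_per_mb=0.05):
    """Return the index of the last point worth keeping.

    curve is a list of (storage_mb, performance_gain_pct) pairs sorted
    by storage. The scan stops at the first step whose marginal gain
    per extra megabyte drops below min_gain_per_mb.
    """
    for i in range(1, len(curve)):
        extra_storage = curve[i][0] - curve[i - 1][0]
        extra_gain = curve[i][1] - curve[i - 1][1]
        if extra_storage <= 0 or extra_gain / extra_storage < min_gain_per_mb:
            return i - 1
    return len(curve) - 1


# Made-up readings of the wizard's storage/gain graph.
sample = [(0, 0), (50, 40), (100, 65), (150, 78),
          (200, 84), (300, 86), (500, 87)]
idx = stop_point(sample)
print(sample[idx])  # (200, 84)
```

With these sample values, the step from 200 MB to 300 MB buys only two more percentage points, so the sketch stops at 84%, inside the range where the curve typically flattens.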

When you are satisfied with the aggregation design, you simply click Next and name this design (the sample is named AggregationDesignPrimary, as you can see in Figure 39). You then assign this aggregation design to the partition that will use it, in the Partitions tab.

Figure 39. Resulting aggregation design to be assigned to the Comp Sales Factoid partition.

If your company has sales transaction data for the past five years and 250 stores that each sell an average of 1,000 items per day, the fact table will have about 456,500,000 rows (1,826 days, counting one leap day, times 250 stores times 1,000 items). This is obviously a challenge in terms of disk space by itself, without aggregation tables to go along with it. The control that SSAS provides here is important in balancing storage and retrieval speed (that is, performance versus size). Aggregations are built to optimize rollup operations so that higher levels of aggregation are easily derived from the existing aggregations to satisfy broader queries. If a high degree of query optimization weren’t possible due to limitations in storage space, SSAS might choose to build aggregates of monthly or quarterly data only. If a user queried the cube for yearly or multiyear data, those aggregations would be created dynamically from the highest level of pre-aggregated data. With disk storage becoming more and more inexpensive and servers becoming more powerful, the tendency is to favor performance over storage savings. A recommended approach is to specify between an 80% and 90% performance gain here.
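The sizing arithmetic and the rollup idea can be sketched in a few lines of Python. The five-year date range below is an assumption chosen to include one leap day, the monthly totals are made-up numbers, and both function names are hypothetical; SSAS performs the equivalent rollup internally.

```python
from collections import defaultdict
from datetime import date


def estimate_fact_rows(start, end, stores, items_per_store_per_day):
    """Rows = days in [start, end) x stores x items per store per day."""
    days = (end - start).days
    return days * stores * items_per_store_per_day


def rollup_to_year(monthly_totals):
    """Derive yearly totals from pre-aggregated (year, month) totals."""
    yearly = defaultdict(float)
    for (year, _month), amount in monthly_totals.items():
        yearly[year] += amount
    return dict(yearly)


# Assumed five-year window (contains the 2008 leap day): 1,826 days.
rows = estimate_fact_rows(date(2006, 1, 1), date(2011, 1, 1), 250, 1000)
print(f"{rows:,}")  # 456,500,000

# Made-up monthly aggregates; yearly values are derived without
# touching the detailed fact rows.
monthly = {(2009, 11): 120.0, (2009, 12): 180.0, (2010, 1): 90.0}
yearly = rollup_to_year(monthly)
print(yearly)  # {2009: 300.0, 2010: 90.0}
```

The second function shows why storing only monthly aggregates can still serve yearly queries: the broader level is computed from a handful of pre-aggregated values rather than from hundreds of millions of fact rows.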

You are now ready to complete the Aggregation Design Wizard. The final step is to either process this aggregation or save your results and process it later. You should choose to process this aggregation now and then click Finish (see Figure 40). The Process Progress dialog appears immediately, and you get to watch the full extent of the cube’s aggregation partitions being built (that is, populated). Aggregation SQL queries are actually created under the covers to populate all these aggregation levels (which implement your design). It’s nice to have SSAS generate these complex queries for this critical performance optimization step so you don’t have to write them yourself.